Exploratory Data Analysis

Kimlee Chea

10/24/2021

Dataset

1. What is this data about?

This dataset was developed to address ongoing concerns about COVID-19 vaccine hesitancy by providing estimates at the county and local level.

This dataset contains summary and detailed records of U.S. residents’ intentions to receive the COVID-19 vaccination with sociodemographic and geographic information.

2. Who produced this data set?

This dataset was obtained from the data catalog hosted on https://www.data.gov/ and was published by the Centers for Disease Control and Prevention (CDC) in coordination with the Office of the Assistant Secretary for Planning and Evaluation (ASPE) and U.S. Department of Health and Human Services (HHS).

The data was constructed from federal survey data provided by the U.S. Census Bureau, specifically the Household Pulse Survey (HPS) collected from May 26 to June 7, 2021. The HPS gives the data its state-level structure. To predict hesitancy rates at the county and local level, the dataset uses Public Use Microdata Areas (PUMAs) and the 2019 American Community Survey (ACS) to create estimates apportioned across counties and local communities.

3. Why are you interested in it?

The COVID-19 pandemic continues to be a public health issue, and concerns are rising that an anti-vaccination movement is slowing community immunization. Outside of that movement, hesitancy to receive the COVID-19 vaccine appears to remain widespread in the general U.S. population despite government mandates. The U.S. public is witnessing the lasting effects of the pandemic on social and economic infrastructure.

Initial Analysis Questions

What proportions of the U.S. population are hesitant to receive the COVID-19 vaccine?

Why are people hesitant to receive the COVID-19 vaccine?

Is there a connection between vaccination rates and vaccine hesitancy?

What social demographics are indicators of vaccine hesitancy in a population?

How does hesitancy contribute to the continued spread of COVID-19 and its variants?

Discoveries & Insights

Estimated Hesitancy By State

Hesitancy Defined

Estimated hesitancy, as recorded in the dataset, is the percentage of a population predicted from survey responses to be resistant or hesitant to taking the COVID-19 vaccine. The dataset reports this estimate alongside summary statistics such as the SVI and the CVAC level of concern, which are explored in later visualizations.

The bar chart above shows the average estimated hesitancy for each state, computed as the mean across its counties. State averages range from roughly 5% to roughly 25%. This indicates that as many as a quarter of those surveyed in Montana, and about a fifth of those in states such as Alaska, Arkansas, or Wyoming, are estimated to be hesitant towards receiving the COVID-19 vaccine.
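
The state-level averaging described above can be sketched in a few lines of Python. This is only an illustration of the aggregation, not the Tableau calculation used for the chart: the row layout, the placeholder state and county names, and the hesitancy values are all assumptions made up for the example, and the mean is unweighted (it does not account for county population).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical county rows: (state, county, estimated hesitancy as a fraction).
# Names and values are invented for illustration, not taken from the dataset.
county_rows = [
    ("State X", "County A", 0.26),
    ("State X", "County B", 0.24),
    ("State Y", "County C", 0.06),
    ("State Y", "County D", 0.04),
]

def average_hesitancy_by_state(rows):
    """Return {state: unweighted mean of its counties' estimated hesitancy}."""
    by_state = defaultdict(list)
    for state, _county, hesitancy in rows:
        by_state[state].append(hesitancy)
    return {state: mean(values) for state, values in by_state.items()}

state_averages = average_hesitancy_by_state(county_rows)

# List states from most to least hesitant, as a bar chart would order them.
for state, avg in sorted(state_averages.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{state}: {avg:.1%}")
```

An unweighted mean treats a small county the same as a large one, so a population-weighted average could give a noticeably different state figure.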

SVI Across States

What is the SVI?

The Social Vulnerability Index (SVI) is a CDC and U.S. Census sponsored methodology that combines categorical rankings with census variables. When planners must allocate emergency resources, the SVI offers community-level insight into how an event that endangers the health of a region is likely to affect it. The SVI also includes tools to estimate population and geographical factors.

There are four categories of overall vulnerability and each grouping has designated factors relative to those themes. The categories and factors are as follows (15 in total):

  • Socioeconomic Status
    • Below Poverty
    • Unemployed
    • Income
    • No High School Diploma
  • Household Composition & Disability
    • Aged 65 or Older
    • Aged 17 or Younger
    • Older Than Age 5 with a Disability
    • Single-Parent Households
  • Minority Status & Language
    • Minority
    • Speak English “Less than Well”
  • Housing & Transportation
    • Multi-Unit Structures
    • Mobile Homes
    • Crowding
    • No Vehicle
    • Group Quarters

In this case, an overall SVI ranking is used on a county-by-county basis. Each county receives a percentile ranking that represents the proportion of counties with social vulnerability equal to or lower than its own (SVI guidelines describe the same ranking for census tracts). For example, an SVI ranking of 0.60 suggests that 60% of counties in the state or nation are less vulnerable than the county of interest and 40% are more vulnerable. Generally, the higher the SVI percentile, the more vulnerable the county is (as measured by the 15 factors) relative to all other counties in scope, which here is the nation.
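
To make the percentile ranking concrete, here is a minimal sketch that follows the description above: the rank of a county is the fraction of counties whose score is equal to or lower than its own. The raw scores are hypothetical, and this simplified definition is an assumption for illustration; the official SVI documentation may define the rank slightly differently.

```python
def percentile_rank(scores, target):
    """Return the fraction of scores that are less than or equal to target."""
    if not scores:
        raise ValueError("scores must not be empty")
    return sum(1 for s in scores if s <= target) / len(scores)

# Hypothetical raw vulnerability scores for ten counties (illustrative only).
raw_scores = [3, 7, 1, 9, 4, 6, 2, 8, 5, 10]

# Six of the ten counties score 6 or lower, so this county ranks at 0.60.
rank = percentile_rank(raw_scores, 6)
print(f"{rank:.2f}")
```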

More information can be found here: https://svi.cdc.gov/Documents/Publications/CDC_ATSDR_SVI_Materials/SVI_Poster_07032014_FINAL.pdf -or- https://www.atsdr.cdc.gov/placeandhealth/svi/index.html

In the above graphic, the average SVI is calculated for each state. States that are warmer in color, closer to the top left of the graphic, and larger in scale have counties that, on average, rank higher on the national SVI.

CVAC Level of Concern

What is the CVAC?

The Surgo COVID-19 Vaccine Coverage Index (CVAC) is used to measure the logistical impact of a list of constraints that COVID-19 vaccine coverage may encounter. The list is as follows: historic undervaccination, sociodemographic barriers, resource-constrained healthcare system, healthcare accessibility barriers, and irregular care-seeking behaviors. CVAC is scaled similarly to the SVI. CVAC level of concern for difficulty in rollout ranges from 0 to 1 with 1 indicating the highest concern. The CVAC is also categorized in the following increments: Very Low (0.0-0.19), Low (0.20-0.39), Moderate (0.40-0.59), High (0.60-0.79), or Very High (0.80-1.0) Concern.
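
The five concern levels listed above can be expressed as a small lookup. This is a sketch under one stated assumption: each band includes its lower bound, so a score such as 0.195 (which falls between the published ranges) is treated as Very Low. The function name `cvac_category` is hypothetical and is not a field or function from the dataset.

```python
# (lower bound, label) pairs, checked from the highest band down.
# Assumption: each band includes its lower bound.
CVAC_BANDS = [
    (0.80, "Very High"),
    (0.60, "High"),
    (0.40, "Moderate"),
    (0.20, "Low"),
    (0.00, "Very Low"),
]

def cvac_category(score):
    """Map a CVAC level of concern in [0, 1] to its concern category."""
    if not 0.0 <= score <= 1.0:
        raise ValueError("CVAC score must be between 0 and 1")
    for lower_bound, label in CVAC_BANDS:
        if score >= lower_bound:
            return label

for example in (0.05, 0.45, 0.92):
    print(example, cvac_category(example))
```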

In the above choropleth map, darker regions indicate higher values in CVAC level of concern. Here we can see high CVAC levels in the U.S. Pacific Southwest, Gulf, and American South regions as well as Alaska. Low CVAC levels are observed in the Northeast and Great Plains/Great Lakes regions. California, Cascadia, and the Pacific regions show mid-range CVAC levels of concern.

A Closer Look at SVI

In the above map, SVI shows a geographic structure similar to that of the previous CVAC graphic, which suggests a relationship between the two indices. One possible insight is that the social vulnerability captured by the SVI's 15 factors affects communities in much the same way as the constraints outlined in the CVAC, with the same issues persisting in national COVID-19 relief and in the distribution of the COVID-19 vaccine.
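
The visual similarity between the two maps could be checked numerically with a correlation coefficient. The sketch below computes Pearson's r in plain Python. The paired SVI and CVAC values are hypothetical, chosen only to show the calculation, and this check was not part of the original Tableau analysis.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(spread_x * spread_y)

# Hypothetical county-level pairs (illustrative only, not from the dataset).
svi_values = [0.1, 0.3, 0.5, 0.7, 0.9]
cvac_values = [0.2, 0.25, 0.6, 0.65, 0.95]

r = pearson_r(svi_values, cvac_values)
print(f"Pearson r = {r:.2f}")
```

A value of r near 1 would support the relationship suggested by the maps, although correlation alone would not show that the two indices share the same underlying causes.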

Mapping Percent Vaccinated

Nulls in Data

Here we encounter the first instance of nulls in the dataset. The percentage of adults vaccinated is null for the states of Texas and Hawaii and for select counties in California, Georgia, and West Virginia. The data and related research do not seem to indicate a loss of data or errors in reporting. Differences in eligibility criteria for the two vaccine doses may have resulted in nulls in vaccination reporting for adults in these areas.
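
One way to handle these nulls before averaging is to count them per state and to skip them in the mean, so that a state with no reported values stays null instead of appearing as 0%. The sketch below uses hypothetical rows and placeholder names; it illustrates the idea and is not the calculation Tableau performed.

```python
from collections import defaultdict

# Hypothetical rows: (state, county, percent of adults fully vaccinated or None).
rows = [
    ("State X", "County A", 0.31),
    ("State X", "County B", None),
    ("State Y", "County C", None),
    ("State Y", "County D", None),
]

def null_counts_by_state(rows):
    """Return {state: (number of null values, total number of counties)}."""
    counts = defaultdict(lambda: [0, 0])
    for state, _county, value in rows:
        counts[state][0] += value is None
        counts[state][1] += 1
    return {state: tuple(pair) for state, pair in counts.items()}

def mean_ignoring_nulls(values):
    """Return the mean of the non-null values, or None if every value is null."""
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None

null_summary = null_counts_by_state(rows)
state_x_mean = mean_ignoring_nulls([v for s, _c, v in rows if s == "State X"])
state_y_mean = mean_ignoring_nulls([v for s, _c, v in rows if s == "State Y"])
print(null_summary, state_x_mean, state_y_mean)
```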

In the above map, we observe the geographic distribution of the percentage of adults fully vaccinated (first and second dose). In virtually every region of the U.S., save a few counties, the percentage of fully vaccinated adults is consistently low as of 6/10/2021, at less than a third in many counties, which reflects a rollout that was still under way. In the American South, percentages are noticeably lower than in the rest of the nation, which is consistent with the CVAC visualization. Questions of availability versus hesitancy start to come into play.

A Deeper Look Into Hesitancy

In the above graphic, we analyze the percent estimated hesitancy in all U.S. counties. Regions in the U.S. South, Pacific Southwest, Great Plains, Great Lakes, and Alaska show higher levels of estimated hesitancy, with counties in Alaska and Montana showing the highest estimates at around 27%. Regions in the U.S. Pacific, Hawaii, Cascadia, and the Northeast show lower estimated hesitancy.

Mapping Hesitant/Unsure

In the above U.S. map, the distribution of percent estimated hesitant or unsure unsurprisingly has a structure similar to that of the previous graphic of percent estimated hesitant. However, the maximum percentage here is higher. Because this measure expands the scope to include those surveyed as unsure, overall hesitancy may lean closer to these percentages.

Mapping Strongly Hesitant

In the above graphic, we observe percent estimated strongly hesitant by county. Once again, the structure and distribution resemble those of the previous graphics, but with a lower maximum percentage.

Vaccinated vs. Hesitancy

In the above area chart, we compare the average percentage of adults fully vaccinated with the average estimated hesitancy (hesitant, hesitant or unsure, and strongly hesitant) per state. The chart indicates that the percentage of adults fully vaccinated is higher in states with low averages in the three hesitancy categories. This suggests that estimated hesitancy affects, or at least plays a role in, the percentage of adults fully vaccinated in each state. At the very least, it is telling that states with high estimated hesitancy have a much lower average percentage of adults fully vaccinated. In these states, hesitancy therefore appears to be slowing COVID-19 relief in a way that lies outside the logistical means of the U.S. public health infrastructure.

Interpreting Ethnic Demographics

In the above layered chart, bars show the percentage of each ethnicity as surveyed in the dataset, with a superimposed area chart plotting the average estimated hesitancy per state. The chart shows the level of diversity in each state as sampled in the dataset, with some states showing more diversity than others. However, no consistent pattern suggests a connection between ethnic demographics and percent estimated hesitancy in a population. Some areas show low diversity and high estimated hesitancy, but states with average or higher diversity and moderately high estimated hesitancy contradict that pattern. Overall, the answer to the initial question of whether ethnicity is a sociodemographic indicator of estimated hesitancy is inconclusive at best.

Summary

Vaccine hesitancy for COVID-19 has by and large become a highly visible issue in today’s social climate, in startling contrast to past pandemic eras. The issues we face today are evolving differently from anything the U.S. has observed before. Measuring estimated hesitancy is a starting point for addressing these concerns. With assistance from organizations such as the CDC and the U.S. Census Bureau and the summary statistics they have developed, hopefully we can reach the roots of these problems and relieve the social and economic strain of COVID-19 and of future public health emergencies.

Lessons Learned

My experience with Tableau was satisfying. There were many ways to interact with the data, especially geospatial data, and I sometimes had to create new calculated measures to iron out quirks in mapping it. The data wrangling and transformation needed to answer the initial questions became apparent as I dove deeper into the dataset, as did the various relationships and structures involved in its creation. Merging all the components that led to the construction of this dataset to give it meaning, shape, and form was enlightening. I soon discovered the many metrics and variables involved in visualizing large data and in answering the keystone questions that motivated the creation of datasets like this in the first place. All things considered, this exploration was a challenging and engaging experience that was made whole with the use of Tableau.